Parse almost-any sacct output #101

Open · wants to merge 7 commits into main
Conversation

@aowenson-imm commented Dec 19, 2023

Implements #100

This enables parsing almost any sacct output. Summary:

  • auto-detect the header in the input CSV and extract the field names from it (a sketch of the idea follows this list)
  • flexible SchedulerJobInfo creation
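
A minimal sketch of the header auto-detection idea; the helper name, delimiter, and detection heuristic below are assumptions, not the actual implementation:

```python
import csv

def detect_header_fields(csv_path, delimiter='|'):
    """Hypothetical helper: return the field names from the first line if it
    looks like a header row, otherwise None."""
    with open(csv_path, newline='') as fh:
        first_row = next(csv.reader(fh, delimiter=delimiter))
    # Heuristic: a header row contains sacct field names such as 'JobID' or
    # 'State' rather than job data (numeric job ids, timestamps, ...).
    if 'JobID' in first_row or 'State' in first_row:
        return first_row
    return None
```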

SchedulerJobInfo:

  • support more fields: user, state, timelimit, ru_ttime
  • modify to_dict() to return all fields, and move *_dt pruning into fields() (sketched below)
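
An illustrative sketch of the to_dict() / fields() split; the attribute set shown is made up:

```python
class SchedulerJobInfo:
    def __init__(self, user, state, submit_time, submit_time_dt):
        self.user = user
        self.state = state
        self.submit_time = submit_time
        self.submit_time_dt = submit_time_dt   # derived datetime value

    def to_dict(self):
        # Return every field, including the derived '*_dt' attributes.
        return dict(self.__dict__)

    def fields(self):
        # The '*_dt' pruning now lives here instead of in to_dict().
        return [name for name in self.__dict__ if not name.endswith('_dt')]
```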

SchedulerLogParser:

  • do not initialise the output CSV in __init__(); instead, initialise it when writing output (see the sketch after this list)
  • modify _get_job_field_names() to accept a job as an optional argument, from which it takes the field names
  • replace _write_csv_header() with _init_csv_output()
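
A rough sketch of the deferred initialisation; only _init_csv_output() and _get_job_field_names() are named by this PR, and the rest of the structure here is an assumption:

```python
import csv

class SchedulerLogParser:
    def __init__(self, output_csv=None):
        # No CSV work in __init__() any more; just remember the path.
        self._output_csv = output_csv
        self._csv_writer = None

    def _get_job_field_names(self, job=None):
        # When a job is supplied, take the field names from it.
        return job.fields() if job is not None else []

    def _init_csv_output(self, job=None):
        # Replaces _write_csv_header(): open the file and write the header
        # the first time output is actually produced.
        self._csv_fh = open(self._output_csv, 'w', newline='')
        self._csv_writer = csv.DictWriter(
            self._csv_fh, fieldnames=self._get_job_field_names(job))
        self._csv_writer.writeheader()

    def write_job(self, job):
        if self._output_csv and self._csv_writer is None:
            self._init_csv_output(job)
        row = {k: v for k, v in job.to_dict().items()
               if k in self._csv_writer.fieldnames}
        self._csv_writer.writerow(row)
```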

SlurmLogParser:

  • auto-detect whether the input CSV has a header; if so, replace SLURM_ACCT_FIELDS with the detected field names and skip the header line during parsing
  • rework _create_job_from_job_fields() to handle almost any sacct output and create a SchedulerJobInfo()
  • add parse_jobs_to_dict(), an alternative to file output, so the result can be fed directly into Pandas (usage sketched below)
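
A hypothetical usage sketch of parse_jobs_to_dict(); the import path, constructor arguments, and return shape are assumptions:

```python
import pandas as pd
from SlurmLogParser import SlurmLogParser   # assumed import path

# Assumed: the parser takes the path to the sacct output file.
parser = SlurmLogParser('sacct_output.txt')
jobs = parser.parse_jobs_to_dict()                 # assumed shape: {job_id: {field: value}}
df = pd.DataFrame.from_dict(jobs, orient='index')  # feed straight into Pandas
print(df[['user', 'state', 'timelimit']].head())
```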

Also fix handling of Slurm jobs with an epilog: for me, these were categorised as FAILED because only the epilog is COMPLETED, so retain the last State value rather than the first.
